These notes will outline:
There have been several different iterations of how to create maps over the years. We will be using sf and geom_sf. sf stands for simple feature. The sf packages is a set of tools for working with geospacital vectors (points, lines, polygons, etc). We won’t be talking about the specifics of these too much but if you are into GIS and want to learn more about how these objects are constructed this blog post might help: https://www.jessesadler.com/post/simple-feature-objects/
library(tidyverse)
#you will need to run install.packages("package name") if this is the first time you have used these packages
library(sf)
library(maps)
For our first example, we will be working with a dataset of North Carolina that is built in to the sf package.
#to pull in the North Caroline Dataset
demo(nc, ask = FALSE, echo = FALSE)
You should notice that the nc dataset is now saved in your R environment. This dataset contains information about Sudden Infant Death Syndrome (SIDS) for North Carolina counties, over two time periods (1974-78 and 1979-84). Let’s take a look at that dataset.
Each row represents a county in North Carolina. This data frame contains the following columns:
AREA County polygon areas in degree unitsPERIMETER County polygon perimeters in degree unitsCNTY_ Internal county IDNAME County namesFIPS County IDFIPSNO County IDCRESS_ID Cressie papers IDBIR74 births, 1974-78SID74 SID deaths, 1974-78NWBIR74 non-white births, 1974-78BIR79 births, 1979-84SID79 SID deaths, 1979-84NWBIR79 non-white births, 1979-84geom information needed to plot the map for each countyLet’s begin by simply plotting the map.
ggplot() +
geom_sf(data = nc)
Let’s pretty it up.
ggplot() +
geom_sf(data = nc, col="black", fill="darkgrey") +
theme_light() +
ggtitle("North Carolina Counties")
#Remember the other theme options: theme_grey(), theme_bw(), theme_dark(), theme_minimal(), theme_classic(), theme_void()
Suppose we want to shade each of these counties, based on the number of births in 1974.
ggplot() +
geom_sf(data = nc, aes(fill = BIR74), col = "black") +
theme_light()+
ggtitle("North Carolina, Birth Rates in 1974")
Here are some options to customize the plot that you might be interested in.
ggplot() +
geom_sf(data=nc) +
aes(fill = BIR74) +
geom_sf_text(data = nc[nc$BIR74 > 15000,], aes(label = NAME), fontface="bold") + # add name labels
scale_fill_gradientn(colors = c('red','yellow')) + #customize colors
theme_light() +
ggtitle("3 Counties in North Carolina, Birth Rates in 1974")
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
## not give correct results for longitude/latitude data
There are different color palettes available in the RColorBrewer package.
library(RColorBrewer)
display.brewer.all()
A note about customizing colors:
RdYlGr is the worst…)display.brewer.all(type="all", colorblindFriendly=TRUE)
You can break up sequential or diverging palettes into a fixed number of colors using brewer.pal.
brewer.pal(11, "Spectral")
## [1] "#9E0142" "#D53E4F" "#F46D43" "#FDAE61" "#FEE08B" "#FFFFBF" "#E6F598"
## [8] "#ABDDA4" "#66C2A5" "#3288BD" "#5E4FA2"
display.brewer.pal(11, "Spectral")
display.brewer.pal(8, "Spectral")
display.brewer.pal(4, "Spectral")
For example,
ggplot() +
geom_sf(data=nc) +
aes(fill = BIR74) +
ggtitle("North Carolina, Birth Rates in 1974") +
scale_fill_gradientn(colors = brewer.pal(8, "Spectral") ) + #customize colors
theme_light()
ggplot() +
geom_sf(data=nc) +
aes(fill = BIR74) +
ggtitle("North Carolina, Birth Rates in 1974") +
scale_fill_gradientn(colors = brewer.pal(11, "Spectral")[5:11] ) #customize colors
ggplot() +
geom_sf(data=nc) +
aes(fill = BIR74) +
ggtitle("North Carolina, Birth Rates in 1974") +
scale_fill_gradientn(colors = c("yellow", "orange", "red") ) #customize colors
rnaturalearthRnaturalearth contains information to map all the countries in the world (among other things). To get out this country data we use ne_countries. We can also use ne_states to get state data. To plot the map using this package, we will use geom_sf.
library(rnaturalearth)
library(rnaturalearthdata)
To download the world data from the rnaturalearth package.
#don't worry too much about where each of the arguements come from (unless you want to)
world <- ne_countries(scale = 'medium', type = 'map_units',returnclass = 'sf')
A plot of the world…
ggplot() +
geom_sf(data = world) +
theme_light() +
ggtitle("World Map, using rnaturalearth")
You can subset the data by continent or by country.
canada <- world %>%
filter(name == 'Canada')
ggplot() +
geom_sf(data = canada) +
ggtitle("Canada") +
theme_light()
A note about projections:
There are lots of different ways to ``flatten" the earth (projections).
coord_sf will help us change the coordinate system, which includes both projection and extent of the map:
maps uses WGS84 as the coordinate reference system (longitude/latitude). Using the argument crs, it is possible to override this setting. (more later)xlim and ylim allow us to crop the map to any area of the world we like. Note that xlim is the latitude and ylim is the longitude.ggplot() +
geom_sf(data = canada) +
coord_sf(crs=st_crs(3347)) ## change to Lambert Projection
ggtitle("Canada") +
theme_light()
## NULL
europe <- world %>%
filter(continent == 'Europe')
ggplot() +
geom_sf(data = europe) +
ggtitle("Europe") +
theme_light()
ggplot() +
geom_sf(data = europe) +
theme_light() +
ggtitle("Europe, Cropped") +
coord_sf(xlim = c(-30,60), ylim = c(35,70))
usa <- world %>%
filter(name == 'United States')
ggplot() +
geom_sf(data = usa) +
ggtitle("USA") +
theme_light()
ggplot() +
geom_sf(data = usa) +
theme_light() +
coord_sf(xlim = c(-130, -65), ylim = c(20, 50)) +
ggtitle("Continental USA")
library(nycflights13)
Note that this library contains a dataset called airports which lists all the airports and their latitudes and longitudes.
head(airports)
## # A tibble: 6 x 8
## faa name lat lon alt tz dst tzone
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 04G Lansdowne Airport 41.1 -80.6 1044 -5 A America/New…
## 2 06A Moton Field Municipal A… 32.5 -85.7 264 -6 A America/Chi…
## 3 06C Schaumburg Regional 42.0 -88.1 801 -6 A America/Chi…
## 4 06N Randall Airport 41.4 -74.4 523 -5 A America/New…
## 5 09J Jekyll Island Airport 31.1 -81.4 11 -5 A America/New…
## 6 0A9 Elizabethton Municipal … 36.4 -82.2 1593 -5 A America/New…
Let’s try plotting each of the airports on the map.
ggplot() +
geom_sf(data = usa) +
theme_light() +
geom_point(data = airports, aes(x = lon, y = lat), pch = 19, size=0.5) +
coord_sf(xlim = c(-130, -65),
ylim = c(20, 50))
Let’s try color coding them by the time zone they are in.
ggplot() +
geom_sf(data = usa) +
theme_light() +
geom_point(data = airports, aes(x = lon, y = lat, col=tzone), pch = 19, size=0.5) +
coord_sf(xlim = c(-130, -65),
ylim = c(20, 50))
mapsThe maps package contains several built in maps: world (for all countries in the world), france, italy, nz, usa, state (usa state boundaries), and county (usa counties).
To reference each map you use map_data("mapname"). Rather than use geom_sf map data in this format needs to be mapped using geom_polygon.
world_map <- map_data("world")
ggplot(world_map, aes(long, lat)) +
geom_polygon(aes(group=group),fill="lightgrey", col="black") +
theme_light()
You can subset the data from particular country/counties.
Load the data:
usa <- map_data("world", region = "usa")
Plot the data:
ggplot(usa, aes(long, lat)) +
geom_polygon(aes(group=group), fill="darkgrey",col="black") +
coord_fixed(ratio = 1) +
theme_light() +
theme(legend.position = "none") +
coord_sf(xlim = c(-130, -65), ylim = c(20, 50))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Note the aes(group=group) option. This is SUPER IMPORTANT, so R knows which things to connect together. Otherwise, the graph would look like this:
ggplot(usa, aes(long, lat)) +
geom_polygon(fill="darkgrey",col="black") +
coord_fixed(ratio = 1) + #maintain aspect ratio
theme_light() +
theme(legend.position = "none") +
coord_sf(xlim = c(-130, -65), ylim = c(20, 50))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Also when using maps the coord_fixed(ratio = 1) is also super important so the aspect ratio is maintained (a country may appear super stretched or squished). This isn’t too bad for the USA, but for a longer skinnier country like Italy this can get pretty wonky.
italy <- map_data("world", region = "Italy")
# ASPECT RATIO NOT MAINTAINED
ggplot(italy, aes(long, lat)) +
geom_polygon(aes(group=group), fill="darkgrey",col="black") +
theme_light() +
theme(legend.position = "none") +
ggtitle("Italy - Aspect Ratio Not Maintained (not good)")
# ASPECT RATIO MAINTAINED
ggplot(italy, aes(long, lat)) +
geom_polygon(aes(group=group), fill="darkgrey",col="black") +
coord_fixed(ratio = 1) +
theme_light() +
theme(legend.position = "none") +
ggtitle("Italy - Aspect Ratio Maintained (better)")
Create a dataset with most European countries:
some.eu.countries <- c(
"Portugal", "Spain", "France", "Switzerland", "Germany",
"Austria", "Belgium", "UK", "Netherlands",
"Denmark", "Poland", "Italy",
"Croatia", "Slovenia", "Hungary", "Slovakia",
"Czech republic"
)
europe <- map_data("world", region = some.eu.countries)
ggplot(europe, aes(long, lat)) +
geom_polygon(aes( group=group, fill=region), col="black") +
coord_fixed(ratio = 1) +
theme_light() +
theme(legend.position = "none")
# Prepare the USArrests data
arrests <- USArrests
arrests$region <- tolower(rownames(USArrests))
head(arrests)
## Murder Assault UrbanPop Rape region
## Alabama 13.2 236 58 21.2 alabama
## Alaska 10.0 263 48 44.5 alaska
## Arizona 8.1 294 80 31.0 arizona
## Arkansas 8.8 190 50 19.5 arkansas
## California 9.0 276 91 40.6 california
## Colorado 7.9 204 78 38.7 colorado
# Retrieve the states map data and merge with crime data
states_map <- map_data("state")
arrests_map <- left_join(states_map, arrests, by = "region") #we will talk alot more about joins in one of the coming modules
head(arrests_map)
## long lat group order region subregion Murder Assault UrbanPop
## 1 -87.46201 30.38968 1 1 alabama <NA> 13.2 236 58
## 2 -87.48493 30.37249 1 2 alabama <NA> 13.2 236 58
## 3 -87.52503 30.37249 1 3 alabama <NA> 13.2 236 58
## 4 -87.53076 30.33239 1 4 alabama <NA> 13.2 236 58
## 5 -87.57087 30.32665 1 5 alabama <NA> 13.2 236 58
## 6 -87.58806 30.32665 1 6 alabama <NA> 13.2 236 58
## Rape
## 1 21.2
## 2 21.2
## 3 21.2
## 4 21.2
## 5 21.2
## 6 21.2
# Create the map
ggplot(arrests_map, aes(long, lat, group = group)) +
geom_polygon(aes(fill = Murder), color = "black") +
coord_fixed(ratio = 1)+
theme_light() +
ggtitle("USA, Murders By State")
There are all kinds of packages for working with maps in different ways that we didn’t discuss. There are all kinds of things you can do with maps that we haven’t discussed - plotting roads, water (lakes, rivers…), railways, etc.
You may want to check out: